Better Viz

Making Good Viz

There is an enormous amount of scholarship and debate about what makes graphs effective, and I can't possibly do the field justice. Below is simply one person's distillation of some tips that are reasonably well agreed upon. I'm aiming to be concise here so that we can practice, but if you want more, visit the links below and the links in the last lecture.

Don'ts

  • pie charts: humans stink at interpreting angles
  • stacked bar charts: tough to decode trends
  • make your reader do math: if $x-y$ is interesting, don't plot $x$ and $y$ separately
  • misleading scales
  • 3D unless absolutely necessary (and it almost surely isn't)
  • distracting chart junk
  • unnecessary colors
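To make the "don't make your reader do math" point concrete, here is a minimal sketch using pandas and matplotlib. The data are made up: points a hypothetical team scored and allowed over six games. If the margin is the story, compute it once and plot it, rather than drawing two series and leaving the subtraction to the reader. The off-screen "Agg" backend is only there so the snippet runs as a plain script.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this runs as a script; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical box scores for six games (made-up numbers for illustration)
games = pd.DataFrame(
    {
        "scored": [98, 105, 110, 92, 101, 115],
        "allowed": [101, 99, 104, 100, 95, 108],
    },
    index=pd.RangeIndex(1, 7, name="game"),
)

# Do the subtraction for the reader: the margin is the quantity of interest
games["margin"] = games["scored"] - games["allowed"]

fig, ax = plt.subplots(figsize=(6, 3))
ax.bar(games.index, games["margin"], color="gray")  # one color; nothing extra to decode
ax.axhline(0, color="black", linewidth=0.8)  # zero is the natural reference for a difference
ax.set_xlabel("Game")
ax.set_ylabel("Points scored minus allowed")
ax.set_title("Point margin by game (hypothetical data)")
fig.tight_layout()
```

Bars above the zero line are wins and bars below are losses, which the reader can see directly instead of comparing two lines game by game.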

An illustration of some of those Don'ts in practice:

Another example to not replicate:

Do's: slides 49-64

  • Show the data, reduce the clutter, and integrate the text and the graph
    • graphs should aspire to be sufficient to understand without reading the text
  • Control the aspect ratio
  • Think about whether you need to include zero. Sometimes excluding it makes the figure misleading. Sometimes including it (and expanding the y-axis to do so) can hide the variation you're describing.
  • Facilitate comparisons:
    • by placing figure components next to or above (it depends!) the things they are compared to
    • by using the same axis (two y-axes is usually bad!)
    • labels > legends! (so readers' eyes don't have to dart back and forth)
    • sort in meaningful orders (i.e., not alphabetically!)
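Several of the Do's translate directly into matplotlib calls. The sketch below uses made-up harvest volumes for the apple-orchard example from the practice questions. It sorts bars by value instead of alphabetically, labels each bar directly instead of adding a legend, sets the figure size (and so the aspect ratio) explicitly, removes frame lines that carry no information, and puts the takeaway in the title. The numbers and styling choices are illustrative assumptions, not recommendations taken from the linked slides.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this runs as a script; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical harvest volumes (bushels) by apple type; made-up numbers
volume = pd.Series({"Fuji": 320, "Gala": 410, "Granny Smith": 150, "Honeycrisp": 275})

# Sort by value, not alphabetically; barh draws the first item at the bottom,
# so ascending order puts the largest bar on top
volume_sorted = volume.sort_values()

fig, ax = plt.subplots(figsize=(6, 3))  # choose the aspect ratio deliberately
bars = ax.barh(volume_sorted.index.tolist(), volume_sorted.to_numpy(), color="steelblue")

# Label each bar directly instead of relying on a legend or x-axis ticks
ax.bar_label(bars, padding=3)

# Remove chart junk: frame lines and ticks made redundant by the labels
for side in ("top", "right", "bottom"):
    ax.spines[side].set_visible(False)
ax.set_xticks([])
ax.set_xlabel("Bushels picked")
ax.set_title("Gala led the harvest (hypothetical data)")
fig.tight_layout()
```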

Transforming bad figures to good ones

Practice: Thinking and planning

TSP: Which type of graph (bar, line, or histogram) would you use for:

  1. The volume of apples picked at an orchard, by type of apple (Granny Smith, Fuji, etc.).
  2. The number of points a team scored in each game of a basketball season.
  3. The count of apartment buildings in Chicago by the number of individual units.

Answers:

  1. This is a nominal categorical variable, so it's a straightforward target for a bar graph.
  2. This is a (nearly) continuous variable observed over 82 games. A team can score anywhere from roughly 50 to 150 points in a game, too many distinct values for a bar chart; a line chart across the season is a good way to go. A histogram, a boxplot, or a density graph could also work.
  3. A density chart would work, but you could also use a histogram as long as you "bin" the buildings (<10 units, 10-50 units, etc.). Note that this variable will be right-skewed because only a few buildings have 500+ units.
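The binning step for the apartment question can be sketched with `pandas.cut`. The unit counts below are invented for 15 hypothetical buildings (not real Chicago data), and the bin edges and labels are illustrative choices. `right=False` makes each bin include its left edge, so 10 lands in "10-49" rather than "1-9".

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this runs as a script; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical unit counts for 15 apartment buildings (made-up, right-skewed)
buildings = pd.Series(
    [2, 3, 4, 6, 8, 12, 16, 24, 30, 48, 60, 85, 120, 240, 650], name="units"
)

# Bin into size classes: [1, 10), [10, 50), [50, 100), [100, 500), [500, inf)
edges = [1, 10, 50, 100, 500, np.inf]
labels = ["1-9", "10-49", "50-99", "100-499", "500+"]
size_class = pd.cut(buildings, bins=edges, right=False, labels=labels)

# Count buildings per bin; sort_index keeps the bins in size order, not count order
counts = size_class.value_counts().sort_index()

fig, ax = plt.subplots(figsize=(6, 3))
ax.bar([str(c) for c in counts.index], counts.to_numpy(), color="gray")
ax.set_xlabel("Units per building")
ax.set_ylabel("Number of buildings")
fig.tight_layout()
```

Because the bins are ordered categories, the bars follow building size, which keeps the skew visible: most buildings sit in the first two bins and one lands in "500+".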
More practice questions:

  1. Suppose we create a scatter plot but find that, due to the large number of points, it's hard to interpret. What are two things we can do to fix this issue?
  2. Suppose that we create an n by n FacetGrid. How big can n get?
  3. What two things about faceting make it appealing?
  4. When is a pairplot most useful?

Answers:

  1. One fix is to plot a random sample of the points. Another is to use a hex plot.
  2. 5x5 is probably as large as you want to go.
  3. It's easy to do, and it shows information about additional variables of interest within a single figure.
  4. It's especially useful when you're exploring a dataset.
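Both overplotting fixes from the first answer can be sketched side by side in matplotlib. The 100,000 correlated points are simulated, and the sample size and `gridsize` are illustrative assumptions. The two panels share axes, so they can be compared directly. If you are working in seaborn instead, `sns.jointplot(data=points, x="x", y="y", kind="hex")` draws a comparable hexbin.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this runs as a script; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Simulated data: 100,000 correlated points, far too many for a readable scatter
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
points = pd.DataFrame({"x": x, "y": 0.5 * x + rng.normal(size=100_000)})

fig, (ax_sample, ax_hex) = plt.subplots(1, 2, figsize=(10, 4), sharex=True, sharey=True)

# Fix 1: plot a random sample; small, semi-transparent markers also reduce overlap
sample = points.sample(n=2_000, random_state=0)
ax_sample.scatter(sample["x"], sample["y"], s=5, alpha=0.3)
ax_sample.set_title("Random sample of 2,000 points")

# Fix 2: hexagonal binning, where color encodes how many points fall in each cell
hb = ax_hex.hexbin(points["x"], points["y"], gridsize=40, cmap="viridis")
fig.colorbar(hb, ax=ax_hex, label="Points per cell")
ax_hex.set_title("Hexbin of all 100,000 points")

for ax in (ax_sample, ax_hex):
    ax.set_xlabel("x")
ax_sample.set_ylabel("y")
fig.tight_layout()
```

Sampling keeps individual points visible but discards data; hexbin uses every point but hides individual observations. Which to use depends on whether outliers or overall density is the story.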

Practice: Fixer Upper

The usual process is to start with simple figures and then iterate to improve them. Naturally, almost all figures start, well, not great.

Work with the classmate next to you to improve the "first pass" figures we started last class.

Use the guidelines above.

My turn: Oh the possibilities

On the last lecture page, I introduced a larger set of firm accounting variables. I want to show you how far we can push this.

If you want to see the code that makes these, view the raw .ipynb file on GitHub. The code uses plotly's subpackage plotly.express, which is ridiculously easy to use.

One more

This is a replication of a famous figure from a Hans Rosling TED talk, using the well-known Gapminder data:

Before next class

  1. Improve all of the plots on the Visualization Practice page.
  2. Flip through the links above and the references on the Making Viz page. Note any neat chart types or chart improvements that you would like to implement sometime.
  3. Now that we have most of our toolkit in place, read "What I do when I get a new data set as told through tweets."

References

See the last lecture.